Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add resume command and support saving the argument state. #3508

Merged
merged 50 commits into from
Oct 12, 2023

Conversation

ryfu-msft
Copy link
Contributor

@ryfu-msft ryfu-msft commented Aug 7, 2023

Related to:
#3165

Changes:

  1. Add a new Resume command that takes in a --resume-id; -g argument that points to a resume state.

  2. Created a Checkpoint Index that creates/interacts with the following tables:

  • CheckpointMetadataTable (Responsible for tracking important information about a resume i.e. command, args, clientVersion)
  • CheckpointContextTable (Records the checkpoint and all relevant context data. Will be extended in a future PR to handle more complex context data)
  1. The resume command will create a brand new context and initialize the CheckpointManager with the provided --resume-id argument. The CheckpointManager will be utilized to interface with the checkpointIndex and retrieve information for calling the correct command.
  2. Added a Resume function that an individual command can override if they have command-specific steps for resuming.

Tests:
Add unit tests to verify the following scenarios:

  • Invalid resume id
  • Guid resume state does not exist
  • Resume state is empty and does not have any values.
  • Successful install should clean up index.
  • Failed install should not clean up index.

Added E2E tests to verify:

  • Successful install cleans up checkpoint index after completion.
  • Failed install leaves behind a checkpoint save file that can be called using the resume command to replicate the same outcome.
Microsoft Reviewers: codeflow:open?pullrequest=#3508

@ryfu-msft ryfu-msft requested a review from a team as a code owner August 7, 2023 18:10
Copy link
Member

@JohnMcPMS JohnMcPMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably talk again.

src/AppInstallerCLICore/Argument.cpp Outdated Show resolved Hide resolved
src/AppInstallerCLICore/CheckpointManager.cpp Outdated Show resolved Hide resolved
src/AppInstallerCLICore/CheckpointManager.cpp Outdated Show resolved Hide resolved
src/AppInstallerCLICore/CheckpointManager.cpp Outdated Show resolved Hide resolved
src/AppInstallerCLICore/CheckpointManager.cpp Outdated Show resolved Hide resolved
src/AppInstallerCLICore/CheckpointManager.cpp Outdated Show resolved Hide resolved
src/AppInstallerCLICore/Commands/InstallCommand.cpp Outdated Show resolved Hide resolved
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Author-Feedback Issue needs attention from issue or PR author label Aug 7, 2023
Copy link
Contributor

@Trenly Trenly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be extended in a future PR to handle more complex context data

I presume this relates to sub-contexts, multi-installs, etc?

src/AppInstallerCLICore/Argument.cpp Outdated Show resolved Hide resolved
@@ -345,6 +349,8 @@ namespace AppInstaller::CLI
return Argument{ type, Resource::String::DownloadDirectoryArgumentDescription, ArgumentType::Standard, Argument::Visibility::Help, false };
case Args::Type::InstallerType:
return Argument{ type, Resource::String::InstallerTypeArgumentDescription, ArgumentType::Standard, Argument::Visibility::Help, false };
case Args::Type::ResumeGuid:
return Argument{ type, Resource::String::ResumeGuidArgumentDescription, ArgumentType::Standard, true };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to understand the design of this feature a bit more; What was the reason for making this a standard argument and not a positional argument? What happens if the argument is not provided?

src/AppInstallerCLICore/Command.cpp Show resolved Hide resolved

checkpointManager.Checkpoint(resumeContext, Execution::CheckpointFlag::CommandArguments);

commandToResume->Execute(resumeContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the command to be resumed is actually calling the command itself, I presume this already handles a change in state on the machine? i.e - User started a command with experimental feature for dependencies enabled, first dependency spawned a reboot, upon reboot, dependencies experimental feature was magically disabled by some script that ran during the reboot, and then the winget resume happens; Would it fail due to a change in state? Proceed but just not install further dependencies? What if an argument was used originally that was behind an experimental feature and then was disabled? I think that would cause it to error out and show the "need to enable experimental feature for this", but would that also remove the resume from the checkpoint index?

src/AppInstallerCommonCore/Public/AppInstallerRuntime.h Outdated Show resolved Hide resolved
using namespace SQLite;
using namespace std::string_view_literals;

static constexpr std::string_view s_CheckpointArgumentsTable_Table_Name = "CheckpointArguments"sv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having a column for each argument type, wouldn't it make more sense to have an ArgumentName table which associates a unique numerical ID to each argument type, then use a many-to-one architecture with dynamic typing for storing the arguments? It would reduce the amount of space needed in memory to not have to have empty columns for every argument, especially when arguments aren't supported by certain commands

image

@microsoft-github-policy-service microsoft-github-policy-service bot added the No-Recent-Activity Issue has no recent activity label Aug 14, 2023
@microsoft-github-policy-service
Copy link
Contributor

Hello @ryfu-msft,

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

Template: msftbot/noRecentActivity

@denelon denelon removed Needs-Author-Feedback Issue needs attention from issue or PR author No-Recent-Activity Issue has no recent activity labels Aug 15, 2023
@github-actions

This comment has been minimized.

@ryfu-msft
Copy link
Contributor Author

/azp run

@github-actions

This comment has been minimized.

@ryfu-msft ryfu-msft requested a review from JohnMcPMS September 12, 2023 18:23
.github/actions/spelling/allow.txt Outdated Show resolved Hide resolved
}

// Sets a single value for the a data type.
void Set(T dataType, std::string value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should force all fields to have a name considered. The caller can still use empty string as the field name, although I would definitely make a comment on that PR.

src/AppInstallerCLICore/Checkpoint.h Outdated Show resolved Hide resolved
}

// Gets a single value for a data type.
std::string Get(T dataType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field naming comment from above applies here as well. I'm assuming that this is due to using unique enum values per field in the automatic data case. I think it better to have those pass in empty strings as they are special cases rather than expose the standard use cases to "should I name the field or not?".


// A representation of a row in the Checkpoint table.
template <typename T>
struct Checkpoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the repository core, created by the database object. It can have a templated function to retrieve one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to repository core

src/AppInstallerCLICore/Commands/ResumeCommand.cpp Outdated Show resolved Hide resolved

StatementBuilder createTableBuilder;
createTableBuilder.CreateTable(s_CheckpointTable_Table_Name).BeginColumns();
createTableBuilder.Column(ColumnBuilder(s_CheckpointTable_Name_Column, Type::Text).Unique());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time that you are going to use the row id as a foreign key in another table, you must specify it explicitly as a column or it can be changed on you when the table is rebalanced. See the index code for the way that you do it (it has to be one of a few specific names and with specific properties).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed Guid and just made the resume id a string. Also removed the argument validation so that it simply checks if we can find the checkpoint database with the provided resume id string.

createTableBuilder.Column(ColumnBuilder(s_CheckpointDataTable_ContextData_Column, Type::Int));
createTableBuilder.Column(ColumnBuilder(s_CheckpointDataTable_Name_Column, Type::Text));
createTableBuilder.Column(ColumnBuilder(s_CheckpointDataTable_Value_Column, Type::Text));
createTableBuilder.Column(ColumnBuilder(s_CheckpointDataTable_Index_Column, Type::Int));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table should probably have a primary key on { checkpoint, data, name, index }. Also, I don't think you want to allow nulls in any of the columns except for value, and then I think it would be better to have the design be that you simply don't insert the value rather than putting in a null one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to suggested

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@ryfu-msft
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ryfu-msft
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

}
else
{
THROW_HR_IF(ERROR_CANNOT_MAKE, !std::filesystem::is_directory(checkpointsDirectory));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
THROW_HR_IF(ERROR_CANNOT_MAKE, !std::filesystem::is_directory(checkpointsDirectory));
THROW_HR_IF(MAKE_HRESULT_FROM_WIN32(ERROR_CANNOT_MAKE), !std::filesystem::is_directory(checkpointsDirectory));


std::filesystem::path CheckpointManager::GetCheckpointDatabasePath(const std::string_view& resumeId, bool createCheckpointDirectory)
{

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extre newline

auto checkpointIds = m_checkpointDatabase->GetCheckpointIds();

// Remove the last checkpoint (automatic)
checkpointIds.pop_back();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GetAutomaticCheckpoint you check for empty, but not here.

}

const auto& checkpointDatabaseParentDirectory = checkpointDatabasePath.parent_path();
if (std::filesystem::remove(checkpointDatabaseParentDirectory, error))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I've left this comment before, but why not simply delete the directory with a recursive delete method?

AICLI_LOG(CLI, Info, << "Resuming command: " << checkpointCommand);
std::unique_ptr<Command> commandToResume = FindCommandToResume(checkpointCommand);

for (const auto& fieldName : automaticCheckpoint.GetFieldNames(AutomaticCheckpointData::Arguments))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future maintainability, it would be better if this code that reads the values out were next to the code that writes the values in. Even if that means a standalone helper, I would much rather have them in the same file (and even have the functions next to each other).

@@ -2581,4 +2581,27 @@ Please specify one of them using the --source option to proceed.</value>
<data name="WINGET_CONFIG_ERROR_UNIT_SETTING_CONFIG_ROOT" xml:space="preserve">
<value>A unit contains a setting that requires the config root.</value>
</data>
<data name="ResumeCommandLongDescription" xml:space="preserve">
<value>Resumes execution of a previously saved command by passing in the unique guid identifier of the saved command. This is used to resume an executed command that may have been terminated due to a reboot.</value>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<value>Resumes execution of a previously saved command by passing in the unique guid identifier of the saved command. This is used to resume an executed command that may have been terminated due to a reboot.</value>
<value>Resumes execution of a previously saved command by passing in the unique identifier of the saved command. This is used to resume an executed command that may have been terminated due to a reboot.</value>

void Command::Resume(Execution::Context& context) const
{
context.Reporter.Error() << Resource::String::CommandDoesNotSupportResumeMessage << std::endl;
THROW_HR(E_NOTIMPL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terminate the context rather than throwing I think...

@@ -31,6 +31,7 @@ namespace AppInstaller::Runtime
constexpr std::string_view s_PortablePackageRoot = "WinGet"sv;
constexpr std::string_view s_PortablePackagesDirectory = "Packages"sv;
constexpr std::string_view s_LinksDirectory = "Links"sv;
constexpr std::string_view s_CheckpointsDirectory = "Checkpoints"sv;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point I think we will need to have something going through and cleaning out old resume context data. Doesn't need to be now.

@@ -266,4 +266,6 @@ namespace AppInstaller::Utility
// Converts the given boolean value to a string.
std::string_view ConvertBoolToString(bool value);

// Converts the given GUID value to a string.
std::string ConvertGuidToString(GUID value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::string ConvertGuidToString(GUID value);
std::string ConvertGuidToString(const GUID& value);

@ryfu-msft ryfu-msft merged commit 85951ae into microsoft:master Oct 12, 2023
@ryfu-msft ryfu-msft deleted the resilience branch October 12, 2023 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants